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Coloured Image Assessment 

This invention relates to a method, an apparatus and a computer program far 
assessment of coloured images: it is particularly (although not exclusively) relevant to 
assessment of histological slides to provide clinical information on potentially canoerous 
5 tissue such as breast cancer tissue. 

Breast cancer is a common farm of female cancer once a lesion indicative of breast 
cancer has been detected, tissue samples are taken and examined by a histopathologist 
to establish a diagnosis, prognosis and treatment plan. However, pathological analysis of 
tissue samples is a time consuming and inaccurate process. It entails interpretation of 

10 colour images by human eye, which is highly subjective: it is characterised by 
considerable inaccuracies in observations of the same samples by different observers 
and even by the same observer at different times. For example, two different observers 
assessing the same ten tissue samples may easily give different opinions for three of the 
slides - 30% error. The problem is exacerbated by heterogeneity, i.e. complexfty of some 

15 tissue sample features. Moreover, there is a shortage of pathology staff. 

There is a need to provide an objective measurement of Cert>-B2, ER and PR status and 
vascularity to contribute to an effective treatment plan for the patient 

In order that.the Invention might be more fully understood, embodiments thereof will now 
be described, by way of example only, with reference to the accompanying drawings, in 
20 which :- 

Figure 1 Is a block diagram of a procedure for measuring indications of cancer to 
assist in formulating diagnosis and treatment; 

Figure 2 is a block diagram of a process for measuring Cerb-B2 In the procedure of 
Figure 1 ; 

25 Figure 3 is a block diagram of a process for measuring vascularity in the procedure of 
Figure 1; 

Figure 4 is a block diagram of a process for measuring oestrogen and pnjgesterone 
receptor in the procedure cf Figure 1 ; 



Figure 5 

30 



is a pseudo three dimensional view of a red, green and blue colour space 
(colour cube) plotted on respective orthogonal axes; 
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figure 6 is a transformation of Figure 5 to form a chramatioity space; 
Figure 7 is a drawing of a chromaticity space reference system; and 
Rgure S illustrates use of polar co-ordinates. 

The examples of Invention to be described herein are three different inventions which 
5 can be implemented separately or together, because they are all measurements which 
individually or collectively assist a clinician to diagnose cancer and to formulate a 
treatment programme. In descending order of importance, the procedures are 
determination of oestrogen and progesterone receptor, determination of Cerb-B2 and 
determination of vascularity. For convenience however, they are described in the order 
10 determination of Cerb-B2, determination of vascularity and determination of oestrogen 
and progesterone receptor. 

A procedure 10 for the assessment of tissue samples in the form of histopathological 
slides Of potential carcinomas of the breast is shown in Figure 1- This drawing illustrates 
processes, which generate measurements of specialised kinds for use by a pathologist 
15 as the basis for assessing patient diagnosis, prognosis and treatment plan. 

The procedure 10 employs a database, which maintains digitised image data obtained 
from histological slides as will be described later. Sections are taken (cut) from breast 
tissue samples (biopsies) and placed on respective slides. Slides are stained using a 
staining agent selected from the following depending on which parameter is to be 
20 determined: 

a) immunohistochemical staining for Cerb-B2 with diaminobenzidine (DAB) as 
substrate (chemical staining agent) - collectively H Cerb-DAB 1? - this is for 
assessing Cerb-B2 gene amplification status; 

b) Oestrogen receptor (ER) with DAB as substrate (collectively "ER-DAB") for 
25 . assessing the expression (the amount expressed or emitted) of the oestrogen 

receptors. Progesterone receptor (PR) status is investigated using chemical 
treaqtment giving the same colouration as in ER. 

c) Immunohistochemical staining for CD31 with fuchsin (F) as. substrate for 
assessing vascularity (angiogenesis). 
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In a prior art manual procedure, a clinician places a slide under a microscope and 
examines a region of tt (referred to as a file) at magnification of x40 for Indications of 
Cerb-B2, ER and PR status and vascularity. 

The present invention requires data from histological slides in a suitable form. In the 
5 present example, image data were obtained by a pathologist using Zeiss Axioskop 
microscope with a Jenoptiks Progres 3012 digital camera. Image data from each slide is 
a set of digital images obtained at a linear magnification of 40 (Le. 40X), each image 
being an electronic equivalent of a tile. 

To select images, a pathologist scans the microscope over a slide, and at 40X 
10 magnification selects regions (tiles) of the slide which appear to be most promising in 
terms of an analysis to be performed. Each of these regions is then photographed using 
the microscope and digital camera referred to above, which produces for each region a 
respective digitised Image in three colours, i.e. red, green and blue (R, Q & B). Three 
intensity values are obtained for each pixel in a pixel array to provide an image as a 
15 combination of R, G and B image planes. This image data is stored temporarily at 12 for 
later use. 

Two tiles are required for vascularity measurement at 14, and one tile for each of 
oestrogen and progesterone receptor measurement at 1 6 and Cerb-B2 measurement at 
18. These measurements provide input to a diagnostic report at 20. 

20 The prior art manual procedure for scoring Cerb-B2 involves a pathologist subjectively 
and separately estimating stain intensity, stain location and relative number of cells 
associated with a feature of interest in a tissue sample. The values obtained in this way 
are combined by a pathologist to give a single measurement for use in diagnosis, 
prognosis and reaching a decision on treatment The process hereinafter described in 

25 this example replaces the prior art manual procedure with an objective procedure. 

Referring now to Figure 2. a flow diagram of the Cerb-B2 measurement process 18 is 
shown In more detail: the process 18 is applied to one image or tite obtained by 
magnifying by a factor of 40 an area of a histological slide. The image is a three colour 
or RGB image as defined above, i.e. there is a respective image plane for each colour, 
30 For the purposes of the following analysis, the letters R, G and B for each pixel are 
treated as the red green and blue intensities at that pixel. The RGB input image is used 
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at 30 to compute a cyan image derived from the blue and green image planes: i.e. for 
each pixel a cyan intensity C is computed from, C = (2 x B + G)/3, the respective pixel's 
green (G) intensity being added to twice its blue (B) intensity and the resulting sum being 
divided by three. When repeated for all pixels this yields a cyan image or image plane. 

5 At 32, a Sobel edge fitter is applied to the cyan image plane: this is a standard image 
processing technique published In Klette R., & Zamperoni P., Handbook of image 
processing operators', John Wiley & Sons, 1995. A Sobel edge fitter consists of two 3x3 
arrays of numbers S P and Sq, each of which is convolved with successive 3x3 arrays of 
pixels in an image. Here 



10 Sj> = 



" 1 2 1 
0 0 0 
-1 -2 -1 



and S Q = 



1 0 -1 

2 0-2 
1 0 -1 



d> 



The step 32 initially selects a first cyan 3x3 array of pixels in the top left hand corner of 
the cyan image: designating as Cij a general cyan pixel in row i and column J, the top left 
hand corner of the image consists of pixels On to Cia. C ai to Cea and C31 to C33. Cij Is 
then multiplied by the respective digit of Sp located In the S P array as Cij is in the 3x3 
15 cyan pixel array: i.e. Cn to C13 are mutiplled by 1, 2 and 1 respectively, C21 to Ca 5 by 
zeroes and C 31 to Caa by -1, -2 and -1 respectively. The products so formed are added 
algebraicly and provide a value p. 

The value of p will be relatively low for pixel values changing slowly between the first and 
third rows either side of the row of Ca, and relatively high for pixel values changing 

20 rapidly between those rows: in consequence p provides an indication of image edge 
sharpness across rows. This procedure is repeated using the same pixel array but with 
Sq replacing S P , and a value q is obtained; q is relatively low for pixel values changing 
slowly between the first and third columns either side of the column of C22, and relatively 
high for pixel values changing rapidly between those columns: and q therefore provides 

25 an indication of image edge sharpness across columns. The square root of the sum of 
the squares of p and q are then computed \. B .^p z +q- , which is defined as an "edge 
magnitude" and becomes (replacing pixel C22 at the centre of the 3X3 array) in the 
transformed cyan image. It is also possible to derive an edge "phase angle" as tan'Vq. 
but that is not required in the present example. 
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A general pixel T 5 (row i, column j) in the transformed Image is derived from Cm j., to G- 
u+i. Cij.i to 0,^1 and Cmjo to Qn W of the cyan image. Because the central row and 
column of the Sobel filters in Equation (1) respectively are zeros, and other coefficients 
are 1 s and 2s, p and q for T 8 can be calculated as follows: 

5 p - { Cm j-i + 2d., j + Cm,j+i} - { Cm j»i + 2 C i*i ,\ + Ci+i j+i} (2) 
q = { Cu 1mH + 2C it & + Cm.^} - { C MllM + 2C ^+ C } (3) 

Beginning with i m j = 2, p and q are calculated for successive 3X3 pixel arrays by 
incrementing j by 1 and evaluating Equations (2) and (3} for each such array untif the end 
of a row fs reached; j is then incremented by 1 and the procedure is repeated for a 
10 second row and so on until the whole image has been transformed. This transformed 
image is referred to below as the "Sobel of Cyan" image or image plane. 

The Sobel filter cannot calculate values for pixels at image edges having no adjacent 
pixels on one or other of its sides: i.e. in a pixel array having N rows and M columns, 
edge pixels are the top and bottom rows and the first and last columns, or in the 
15 transformed image pixels T„ to T nM , T N1 to T N m, T n to T 1M and T 1M to T NM . By convention 
in Sobel filtering these edge pixels are set to zero. 

« 

The next step 34 is to compute the mean and standard deviation of the transformed pixel 
values T 5 . For convenience a change of nomenclature is implemented: index k is 
substituted for i and j, I.e. k = 1 to NM for i,j = 1, 1 to N, M; this treats a two dimensional 
20 image as a single composite line composed of successive rows of the image. Also x is 
substituted for T in each pixel value, so Tj becomes x*. The following Equations (4) and 
(5) respectively are used for computing the mean p, and standard deviation a of the 
transformed pixels x k . 



1 «5C 
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At 36, various statistical parameters are computed for the Red, Green, Blue and Cyan 
image planes using Equations (4) and (5) above and Equation (6) below. 



Skewness 




For the Red image plane the statistical parameters are the mean ^ standard deviation or 
5 and skewness of its pixel values: in Equations (4) to (6) x« (k= 1 to NM) represents a 
general pixel value in the Red image plane. In addition, the Red image plane's pixels are 
compared wEth one another to obtain their maximum, minimum and range (maximum — 
minimum). Similarly, pixels in each of the Green and Blue image planes are compared 
with one another to obtain a respective maximum, minimum and range for each of these 
10 planes. Finally, for the Cyan image, pixels' mean and standard deviation are computed 
using Equations (4) and (5), in which x k represents a general pixel value in the Cyan 
image plane. 

At 38, a cofour segmentation is computed from the Red, Green, Blue, Cyan and Sobel of 
Cyan image planes. Step 38 is shown in more detail within chain lines 40. Colour 
15 segmentation is implemented by applying three sets of adaptive thresholding operations 
referred to as D, E and F respectively. Each set of operations involves two respective 
numerical factors: D has factors D1 =0,802 and 02 = 1.24, E has E1 = E2 = 0.903 t and 
F has F1 = 1 .24 and F2 = 0.802. 

The first set of thresholding operations is D using factors D1 and D2. It implements the 
20 following steps: 

(a) Produce a threshoided image for the Red image plane as follows: for every Red 
pixel value that is less than an adaptive threshold, set the corresponding pixel 
location in the threshoided image to 1, otherwise set the latter to 0. A respective 
adaptive threshold is computed separately for every pixel location as follows: 

25 0) Select the same pixel location in the cyan image and from it search in four 

directions - the north, south, east and west directions - for seventy pixel 
locations (or as many as are available up to seventy)* Here north, south, east 
and west have the following meanings: north: upwards from the pixel in the 
same column; south: downwards from the pixel in the same column; east:- 
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rightward from the pixel in the Same row; and wesb leftward from the pixel in 
the same row. More directions could be used to improve accuracy but four 
have been found to be adequate for the present process. If any of the 
seventy locations in a respective direction has a pixei value less than the 
5 product of D2 and the Cyan mean pc then the direction is assigned a value of 

1, otherwise it is assigned a value of 0. 

(Si) Sum all four direction values to provide a sum and set the adaptive 
threshold for the current pixel location to whichever is the lesser of 265 and 
(Dl x Red mean Mr) + (S d - 4)'x Red standard deviation o R ). 

10 (b) Produce a thresholded Image for the Cyan Image plane as follows: using the 

Cyan mean He from step 34, for every Cyan pixel value that Is less than (D2 x \io), 
set the pixel in the corresponding location in the threshoided image to 0, otherwise 
set the latter to 1 . This has the effect of removing excess brown pixels. 

(c) Produce a threshoided image for the transformed or Sobel of Cyan Image plane 
15 as follows: using the Cyan mean nc and standard deviation cr c from step 34: I.e. for 

every Cyan pixel value that is greater than (fie + 1 -Sa c ) set the corresponding pixel 
in the threshoided image to 0, otheiwise set the latter to 1- This has the effect of 
removing excess edge pixels. 
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(d) Using the pixel minimum and range values computed at step 36, a single 
threshoided binary image is now produced using data obtained from the Red, 
Green and Blue image planes. This is carried out as follows: for each Red, Green 
and Blue pixel group at a respective pixel location that satisfies all three criteria at 
(i) to (iii) below, set the pixel at the corresponding location in the threshoided image 
to 0, otherwise set the latter to 1 ; this has the effect of removing lipid image regions 
(regions of fat which appear as highly saturated white areas). The criteria for each 
co-located Red, Green and Blue pixel group are: 

(i) RBd pixel > Red minimum + 0.98 x (red range), AND 

(ii) Green pixel > Green minimum + 0-9$ x (green range), AND 

(iii) Blue pixel > Blue minimum + 0.98 x (blue range) 
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15 



20 



25 



(e) The next step Is to apply to the binary image obtained at 38d above a series of 
morphological closing operations, which consist of a dilation operation followed by 
an erosion operation. These morphological operations are to fuse narrow gaps and 
eliminate small holes in Individual groups of contiguous pixels appearing as blobs 
fn an image. It can be thought of as removal of irregularities or spatial 'Yioise", and 
it is a standard image processing procedure published in Umbaugh S.C., 'Colour 
vision and image processing 1 , Prentice Hall, 1998. The number of operations 
applied depends on circumstances: operations (i.e. filling small gaps and/or holes) 
terminate when the number of changes made by the erosion operation is greater 
than the number made by the dilation operation. 

(f) A connected component labelling process is now applied to the binary image 
produced at 38e. This is a known image processing technique (sometimes referred 
to as 'blob colouring") published by R Klette and P Zamperoniu, 'Handbook of 
Image Processing Operators 9 , John Wiley & Sons, 1996, and A Rosenfeld and A C 
Kak, 'Digital Picture Processing 1 , Vols. 1 & 2, Academic Press, New York, 1982. It 
gives numerical labels to "blobs" in the binary image, blobs being regions or groups 
of like-valued contiguous or connected pixels in an image: i.e. each group or blob 
consists of connected pixels which are all 1s, and each is assigned a number 
different to those of other groups. This enables individual blobs to be distinguished 
from others by means of their labels. The number of labelled image regions or 
blobs in the image is computed from the labels and output. Connected component 
labelling also determines each labelled image region's centroid (pixel location of 
region centre) and area, which are used later as will be described. 

As indicated by arrows 42, the second and third sets of thresholding operations E and F 
are carried out by iterating steps 38a to 38f above using factors E1/E2 and F1/F2 
respectively. 

(g) A technique referred to as the Downhill Simplex method is now applied to the 
results of the thresholding operations obtained from 38a to 38f above. It is a 
standard iterative statistical technique for multidimensional optimisation published 
in Nelder J.A., Mead Ft, 1965, Computer Journal, vol. 7, pp 308-313, 1965. In the 
present example the technique is used to determine which pair of factors D1/D2, 
E1/E2 or F1/F2 used above yields the maximum number of image regions or 
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groups of connected pixels. It takes as input the three pair of factors D1/D2, E1/E2 
and F1/F2 together with the function returning the number of detected regions and 
described at 38a and 38b above for adaptive thresholding. 

The procedure is now concerned with determining quantities referred to as "grand mean" 
5 and "mean range" to be defined later. If the Downhill Simplex method 3Bg has 
determined that there are not more than a user specified number of Image regions, 
sixteen in the present example, then at 44 processing is switched to 46: here both grand 
mean and mean range are set to 1 .0 and processing passes to step 54 (to be described 
later) to compute a required result. 

If the Downhill Simplex method has determined that there are more than sixteen image 
regions, then at 44 processing is switched to 48 where a search to characterise these 
regions' boundaries is earned out The search uses each region's area and centroid pixel 
location as obtained in connected component labelling at 38f, and each region is 
assumed to be a cell with a centroid which is the centre of the cell's nucleus. This 
assumption is justified for most cells, but there may be misshapen cells for which it does 
not hold: It is possible to discard misshapen cells by eliminating those with concave 
boundary regions for example, but this is not implemented in the present example. 

The search to characterise the regions' boundaries is carried out along the respective 
north, south, east and west directions (as defined earlier) from the centroid: ft is carried 
20 out in each of these directions for a distance S which is either 140 pixels or 
2^1 region area , whichever is the lesser. It employs the cyan image because experience 

shows that this image gives the best defined cell boundaries with the slide staining 
previously described. Designating Cy as the intensity of a region's centroid pixel in the 
cyan image at row i and column j r then pixels to be searched north, south, east and west 
25 of this centroid will have intensities in the cyan image of Cj+ij to G t4 &t> Cm j to Cwj, C y+ .i to 
and G\tf to respectively. The cyan Intensity of each of the pixels to be searched 
is subtracted from the centroid pixel's cyan intensity Cjj to produce a difference value, 
which may be positive or negative. In a cyan image, a cell nucleus is normally blue 
whereas a boundary is brown (with staining as described earlier). 



30 Each pixel is then treated as being part of four linear groups (or windows) of six, twelve, 
twenty-four and forty-eight pixels each including the pixel and extending from ft in a 
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continuous line north, south, east or west according respectively to whether the pixel is 
north, south, east or west of the centrold. Cm j for example is grouped with Cwz.j to Cuss, 
Cwy to G + iaj, Cwy to CWj, and Cwy to Cmbj (inclusive in each case). This provides a 
total of 165 groups from 45 groups in each of four directions. For each group the 
5 difference between each of its pixels' cyan intensities and that of the centrold is 
calculated: the differences are summed over the group algebraically (positive and 
negative differences cancelling one another) and divided by the number of pixels in the 
group to provide a net difference per pixel between the cyan Intensities of the group's 
pixels and that of the centnoid. 

10 For each direction, i.e. north, south, east and west, there is now a respective set of 45 
net differences per pixel: in each set the net differences per pixel are compared and their 
maximum value is identified. This produces a respective maximum net difference per 
pixel for each of the sets, i.e. for each of the north, south, east and west directions, and 
size of window (number of pixels in group) in which the respective maximum occurred. 

15 The four maxima so obtained (one tor each direction) and the respective window size in 
each case are stored. Each maximum is a measure of the region boundary (cell 
membrane) magnitude in the relevant direction, because in a cyan Image the maximum 
difference as compared to a blue cell nucleus occurs at a brown boundary. The window 
size associated with each maximum indicates the region boundary width, because a 

20 boundary width will give a higher maximum in this technique with a window size which it 
more nearly matches as compared to one it matches less well. Greater accuracy is 
obtainable by using more window sizes. A further option is to record the position of the 
maximum or boundary as being that of one of the two pixels at the centre of the window 
in which the maximum occurs: this was not done in the present example, although it 

25 would enable misshapen oelis to be detected and discarded as being indicated by 
significant differences in the positions of maxima in the four directions. 

The next step SO is to apply what is referred to as a "quicksort" to the four boundary 
magnitudes to sort them into ascending order. Quicksort is a known technique published 
by Kiette Ft., Zamperoniu P., 'Handbook of Image Processing Operators', John Wiley & 
30 Sons, 1996, and will not be described. For each image region, measurements made as 
described above are now recorded in a respective 1 -dimensional vector as set out in 
Table 1 below. 
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TABLE 1 



Element number 


Parameter 


0 


North boundary magnitude 


1 


South boundary magnitude 


2 


East boundary magnitude 


3 


West boundary magnitude 


I 4 


Region area (from 38f) 


5 


North boundary width 


6 


South boundary width 


7 


East boundary width 


a 


West boundary width 


9 


Sum of North, South, East and West 
boundary magnitudes 



A further quicksort Is now applied {also at 50) to the regions to sort them into ascending 
order of element 9 values In Table 1 above, i.e. sum of north, south, east and west 
boundary magnitudes. A subset of the regions is now selected as being those having 
5 large values of element 9: these most significant regions are typically the top one eighth 
of the subset of regions in terms of element 9 magnitude. From this subset of regions the 
following parameters are computed at 52, "grand mean", "mean range" and "relative 
range" as defined below : 

octile = one eighth of the number of regions in the subset 
1 0 boundaries = boundary magnitudes 

2 = sum of ... (over all regions in the subset) 
element 1= south boundary magnitude 



(8) 
(9) 
(10) 
(11) 
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element 3 = west boundary magnitude (1 2) 

grand mean = Sx[(S east boundaries) + (S west boundaries) 

+ (S north boundaries) + (S south boundaries)]/ 4ocSIe (1 3) 

mean range = [(£ boundary element 3) - (£ boundary element 1 )]/octile. (1 4) 

5 relative range = 1 0xmean range/grand mean (15) 

At 54 an overall distance measure is computed: this measure provides an estimate of 
how far the current cyan image (used to derive grand mean etc above) is from each 
member of a predetermined standard set of images, four images in the present example, 
in this example the distance measure is computed against a set of four predetermined 

10 standard images; the standard images were obtained by dividing a large test dataset of 
images into four different image types corresponding respectively to four different Cerb- 
B2 status indicators (as will be described fater in more detail). The images of each 
Image type were analysed to determine grand mean and relative range for each image 
as described above. A respective average grand mean M ? (i = 0, 1, 2 and 3) and a 

15 respective average relative range skew f were determined for the images of each of the 
four image types. As an alternative, it is also possible to select four good quality images 
of the relevant types by Inspection from many [mages, and to determine Mi and skewi 
from them. The values Mi and s/rewi become the components of respective four-elernent 
vectors M and skew , and are used in the following expression: 

20 Cerb-B2 indicator = min f {(M, - grand meanf + {skew, - relative range) 1 } (1 6) 

where min* is the value of i ( i ~ 0, 1 f 2 or 3) for which the expression within curved 
brackets { } on the right of Equation (16) is a minimum. For the vector M, from the 
dataset the following elements were determined: Ma = 12.32, Mi =23.16, M 2 = 42.34 and 
M 3 =^ 87.35); elements determined likewise for the vector skew were skem = 2.501, 
25 sfrev/ r =?1.85, skew s = 1.111 and sk&w 3 ~ 6.5394. The value of / is returned as the 
Indicator for the Cerb-B2 measurement process. 

If a value of i = 3 is obtained in the Cerb : B2 measurement process, this is regarded as a 
strongly positive result: the patient from whom the original tissue samples were taken is 
regarded as highly suitable for treatment, currently with herceptin. A value of / = 2 is 
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weakly positive Indicating doubtful suitability for treatment, and / = 1 or 0 is a negative 
result indicating unsuitability. This is tabulated below in Table 2. 

TABLE 2 



Cerb-B2 status 


I Value 


Strongly positive 


3 


Weakly positive 


2 


Negative 


0,1 



5 Referring now to Figure 3, there is shown a flow diagram of the process 14 (see Figure 
1) for measurement of vascularity. The process 14 is applied to two images each of x40 
magnification compared to the histopathological slide from which they were taken. At 60 
each image is transformed from red/green/blue (RGB) to a different image space 
hue/saturation/value (HSV). The RGB to HSV transformation is described by K. Jack in 

10 'Video Demystified', 2 nd ed,, HighText Publications, San Diego, 1996. In practice value V 
(or brightness) is liable to vary due to staining and thickness variations across a slide, as 
well as possible vignetting by a camera lens used to produce the Images. In 
consequence in this example the V component is ignored: it is not calculated, and 
emphasis is placed on the hue (or colour) and saturation values H and S, H and S are 

1 5 calculated for each pixel of the two RGB images as follows: 



Let M = maximum of (R f G t B) (1 7) 

Let m = minimum of (R,G,B) (1 8) 

Then newr = (M- R)/(M -m) (1 9) 

newg = (M - G)/(M - m) and (20) 

20 newb s (M ~ B)/(M - m) (21 ) 



This converts each colour of a pixel into the difference between its magnitude and that of 
the maximum of the three colour magnitudes of that pixel, this difference being divided 
by the difference between the maximum and minimum of (R,G,B). 

Saturation (S) is set as follows: 
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if M equals zero, then S = 0 (22) 

if M does not equal zero, then Sa(M» m)/M (23) 

The calculation for Hue (H) is as follows: from Equation (17) M must be equal to at least 
one of R, G and B: 



5 


if M equals zero, then H =180 


(24) 




If M equals R then H = 60(newb - newg) 


<25) 




If M equals G then H = 60(2 + newr - newb) 


(26) 




If M equals B then H - 60(4 + newg - newr) 


(27) 




If H is greater than or equal 360 then H = H - 360 


(28) 


10 


If H is less than 0 then H = H + 360 


(29) 



The Value V is not used in this example, but were it to be used It would be set to the 
maximum of (R,G,B). 

The next step 62 is to apply colour segmentation to obtain a binary image. This 
segmentation is based on thresholding using the Hue and Saturation from the HSV 
15 colour space, and is shown in Table 3 below. 



TABLE 3 



Threshold Criterion 


Binary Image Pixel Value 


Pixel with both Hue H In the range 282 - 356 degrees 
(scale 0 to 360), and Saturation S in the range 0.2 to 
0-24 (scale 0 to 1) 


Set pixel to 1 


Pixel with either Hue outside the range 282 - 356 
degrees, and/or Saturation outside the range 0.2 - 0.24 


Set pixel to 0 



This produces a segmented binary image. 



The next stage 64 is to apply connected component labelling (as defined previously) to 
the segmented binary image; this provides a binary image with regions of contiguous 
20 pixels equal to 1 , the regions being uniquely labelled for further processing and their 
areas being determined. The labelled binary image is then spatially filtered to remove 
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small connected components (image regions with less than 10 pixels), which provides a 
reduced binary image. 

The sum of the area of the remaining image regions in the reduced binary image is then 
determined at 66 from the results of connected component labelling, and this sum is then 
5 expressed as a percentage of the area of the whole Image. This procedure is earned out 
for both of the original RGB images separately to provide two such percentage area 
values: the average of the two percentage area values is computed, and it represents an 
estimate of the percentage of the area of a tissue sample occupied by blood vessels - 
i.e. the sample vascularity. 

10 As set out in Table 4 below, vascularity Is determined to be high or low depending on 
whether or not It is equal to at least 31 %. 
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TABLE 4 



Description of vascularity 


Range 


High 


31% -100% 


Low 


0%-30% 



15 High vascularity corresponds to relatively fast tumour growth because tumour blood 
supply has been facilitated, and early treatment is indicated. Low vascularity corresponds 
to relatively slow tumour growth, and early treatment is less important 

Referring now to Figure 8 f there is shown a flow diagram of a process 16 for the 
measurement of Oestrogen (ER) and Progesterone (PR). The process for measuring 
20 these two is the same except for their use of different immunohistochemical staining 
processes applied to tissue samples (slide colouring)* The process win be described for 
the case of Oestrogen receptor assessment With diaminobenzidine as staining agent 

A digitised image of a histopathologic^ slide or part of such a slide part can be 
characterised for its response to oestrogen, for any immunohistoohemfcal stain used. A 
25 user oan adapt the process to be described below by defining an appropriate reference 
colour, which must represent a sample with maximum intake of oestrogen. There are two 
main colours In the pixels of ER medical images, I.e. brown and blue. The same colours 
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appear in PR imagery, and so the image processing technique Is the same. In thfs 
example the process determines brown pixels in an Image that are similar to a reference 
brown colour defined by a user of the process. 

Referring now to Figure 4, processing 16 to determine ER status will be outlined and 
5 then described in more detail later. It begins with a pre-processing stage 70 in which a K- 
* means clustering algorithm is applied to a colour image using a Mahalanobis metric. This 
determines or cues image regions of interest for further processing by associating pixels 
. into clusters on the basis of their having similar values of the Mahalanobis metric. At 72 
the colour image is transformed into a chromaticity space which includes a location of a 
10 reference colour. Hue and saturation are calculated at 74 for pixels in clusters cued by K- 
means clustering- The number of brown stained pixels is computed at 76 by thresholding 
on the basis of hue and saturation. An ER status measurement Is then derived at 78 from 
a combination of the fraction of stained pixels and average colour saturation. 

The input for the ER preprocessing stage 70 consists of raw digital data files of a single 
15 colour histopathological image or tile. A triplet of image band values for each pixel 
represents the colour of that pixel in its red, green t and blue spectral components. These 
values rn each of the three image bands are in the range [0...255], where [0.0.0] 
corresponds to black and [255,255,255] corresponds to white. 

The K-means clustering algorithm 70 Is applied to the digital colour image using clusters 
20 and the Mahalanobis distance metric. A cluster is a natural grouping of data having 
similar values of the relevant metric and tha Mahalanobis distance metric is a 
measurement that gives an indication of degree of closeness of data Items to a cluster 
centre. It is not essential to use four clusters or the Mahalanobis distance metric but 
these provide an adequate subdivision of the data and a convenient metric. The K- 
25 means algorithm Is described by J. A* Hartigan and M. A. Wong, in a paper entitled 'A K- 
means clustering algorithm', Algorithm AS 136, in the Applied Statistics Journal, 1979. 
The Mahalanobis distance metric is described by F. Heijden, in 'image Based 
Measurement Systems - object recognition and parameter estimation', John Witey & 
Sons, 1994 and by R. Schalkoff f in Pattern Recognition - Statistical, Structural and 
30 Neural approaches', John Wiley & Sons Inc., 1992. The process comprises an 
initialisation step a) followed by computation of a covariance matrix at Step b). This leads 
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to a likelihood calculation at step c), which effectively provides the distance of a pixel 
from a cluster centre. Tha procedure is as follows: 

a) Initially, cluster centres are set at random locations and approximately equal 
numbers of pixels are arbitrarily assigned to each cluster for later readjustment 
5 b) For each cluster the following computations are carried out 

I) Compute elements of the kindo"£ of a covariance matrix of the image bands Indicating 
the degree of variation between intensities of different colours in pixels of each cluster I 

°t = ^Eta ~/**X«V (30) 

where: <r£ is the if 1 element of the covariance matrix, 
10 is the number of pixels in cluster k } 

Co and c?/y are the values of pixel tin image bands i&ntfi 

h j take values 1, 2, 3, which represent the red, green and blue image bands 

respectively, 

/if is the mean of all pixels in image band / belonging to cluster k, and 
"IS Uj is the mean of all pixels in image band j belonging to cluster k. 

il) Calculate the determinant of tha covariance matrix denoted as 

A* 

ft 

iii) Calculate the inverse of the covariance matrix denoted as - 

lav 

c). With / denoting pixel number, each pixel 3c. is now treated as a vector having three 
elements 'x^x^x^ which are the red (x £t ), green (x h2 ) and blue (jr i(3 ) pixel values: 
20 the red, green and blue image bands are therefore represented by second subscript 
indices 1 , 2 and 3 respectively. With / ranging over all pixels in a cluster k a the likelihood 
d L (x g )ct a pixel vector x s not belonging to that duster is computed from Equation (31) 
below: 
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where ^ and 2j are as defined above, 

det wv 

fi? is the mean of all pixel vectors jc, in cluster fc, and 
t indicates the transpose of the difference vector (x, - fi k ) - 

Equation (31) is re-evaluated for the same pixel vector x { in all other clusters also. Pixel 
5 veGtor x t has the highest likelihood of belonging to a cluster (denoted k^ for which 
rf*(jc f )has a minimum vaiue i.e. {d*- ft)}; cluster km is then the most suitable to receive 
pfxel 5,;i.e.find:- 

(4) < 4 fc fo) for all k * k m (32) 
Assign pixel x s to cluster k m 

10 

d). For each cluster k. 

Store a record of which pixels belong to cluster k as an array Jc*. update it with each pixel 
vector assigned to that cluster and update the number /V* of pixels In that duster. 

Calculate the cluster centre jj. * for each image band j = 1 , 2 and 3 

15 ^J-J-^* < 33 > 

iterate steps b) to d) until convergence, i.e. when no more pixels change clusters or the 
number of iterations reaches a total of 20. 

The first cluster (k = 1) now corresponds to cell nuclei and the corresponding pixel 
vectors are those which are cued as of interest for output and further processing. 

20 

Transformation of the image at 72 from red/green/bfue (RGB) to hue/saturation/value 
(HSV) is as described by K. Jack in 'Video Demystified', 2"* ed., HighText Publications, 
San Diego, 1996. In practice value V (or brightness) is liable to vary due to staining and 
thickness variations across a slide, as well as possible vignetting by a camera lens used 
25 to produce the images. In consequence in this example the V component is ignored: it is 
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not calculated, and emphasis is placed on the hue (or colour) and saturation values H 
and S. H and S are calculated for each pixel of the RGB image as follows: 

(a) Referring now also to Figures 5 to 8 f each RGB image is transformed into a 
chromaticity space. Figure 5 shows an RGB cube 100 in which red, green and 
5 blue pixel values (expressed as R, G and B respectively) are normalised and 

represented as values in the range 0 to 1. These pixel values are represented 
on red, green and blue axes 102, 104 and 106 respectively. The chromaticiiy 
space is a plane 108 for which R+G+B = 1: It is triangular within the RGB cube 
1 00 and passes through the points (1 ,0,0), (0,1 ,0) and (0.0. 1 ). 

10 (b) Rgure S shows the axes 102, 104 and 106 and ohromaticrty space 108 looking 

broadly speaking along a diagonal of the RGB cube 100 from the point (1,1,1) 
(not shown) to the origin (0,0 t 0) now referenced O for convenience. The points 
(0,0,1), (0,1,0) and (1,0,0) in Rgure 5 are now referenced J, K and L 
respectively. D is a midpoint of a straight line between J and L Image pixel 

16 values from the input RGB image are projected on to the chromaticity space 

108 and the resulting projections become data points for further processing. 
The projection calculation is as follows: 

Red green and blue pixel chromaticity values r, g and b respectively are 



20 



defined as:- r= n * g=-— ^ , and b = (34) 



Perpendiculars from a point P in the chromaticity space 1 08 to the lines JK and 
LD meet the latter at E and G respectively. Perpendiculars from P and G to the 
plane JOK meet the latter at F and H respectively. Using Equations (34). the 
point P in the triangular chromaticity space 108 may then be defined by x and 
•25 y co-ordinates shown in Rgure 6 and given by: 

x = £>E = HF =£^J- and y = PE = GD=b^ (35) 

(c) In Rgure 7, the chromaticity space 108 is shown with x and y co-ordinate axes 
extending from an origin Q. A reference colour denoted by a point 5 in the 
drawing is now defined as that specified for this purpose by a clinician: it is the 
30 colour of that part of the image which is most positively stained {the most 
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intense colour on the part of the original slide from which the image was 
taken). The reference colour's RGB components are taken from the [mage and 
its x and y co-ordinates are computed using Equations (34) and (35): these co- 
ordinates are denoted as (£,>). 

5 (d) In Figure 8, a polar co-ordinate system (r 4 6) is now defined on the (R+G+B=1) 

pJane 108. The co-ordinate system origin is the centre of gravity G of the 
triangle 109- A reference direction for 6 = 0 is defined as the direction QS of 
the radius vector to the reference colour S in Rgure 7. For any point such as P 
on the triangle defined as having co-ordinates fay) in the HSV colour space, 
10 hue H is defined as the angle tf> between the radius vector (e.g. QP) to itself 

and the reference colour direction QS. This is computed from the following 
expressions for tj>: 



- jl xy-xy , , xx + yy 

sin 0 = t . and cos0 = — 



i \xy-xy\ 
and the angfe # is defined to be sin 1 



15 For convenience the definition of hue H is now altered somewhat to render all values 
positive and in the range 0 to ti/2: the transformation of earlier values 4> into a new 
version v is shown in Table 5 below: 
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TABLE 5 



Condition 


Magnitude of ijr (New Hue H) 


sin $ > 0 and cos <j> > 0 




sin $ > 0 and cos §< 0 


it- (J) . 


sin <j>* 0 and cos <j> > 0 


-+ 


sin <j>< 0 and cos $ < 0 





A hue (H) threshold v 0 Is set by a user or programmer of the procedure as being not 
more than n/2, atypical value which might be chosen being 80 degrees. Saturation Sis 
5 defined to' be 

saturation — 



x + y 



Two values of saturation threshold S 0 are set according to whether or not image pixel 
saturation value Slies in the range 0-1 to 1 .9: this Is set out in Table 6 below: 



TABLE 6 



Saturation 8 


So 


Either S< 0.1 orS>1.9 


0 


0.1 <S S< 1.9 


o.g 



At 106, the thresholds are used to count selectively the number Nb of pixels which are 
sufficiently brown (having a large enough value of saturation) having regard to the 
reference colour. All H and S pixel values in the image are assessed. The conditions to 
be satisfied by a pixel's hue and saturation values for it to be counted in the brown pixel 
1 5 number N b are set out in Table 7 below. 

TABLE 7 
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Condition 


Action 


For each pixel with both hue modulus 
H < Wo and saturation S>S 0 


Treat as a "saturated" pixel; 
increase count Afe of brown 
pixels by 1 


For each pixel with \y\ > ip u andfor 
saturation SSS 0 


Treat as an "unsaturated" 
pixel; leave N D unchanged 



The average saturation of the Nt saturated pixels determined in Tabfe 7 is computed by 
adding all their saturation values S together and dividing the resulting sum by N#. The 
maximum saturation value of the saturated pixels is then determined, and the average 
5 saturation is expressed as a percentage of this maximum: the average saturation is then 
accorded a score at 78 of 0, 1, 2 or 3 according respectively to whether this percentage 
is (a) £ 25%, (b) > 25% and s 50%, (c) > 50% and <, 75% or (d) > 75% and s 100%. 

The fraction of saturated pixels - those stained sufficiently brown - is computed at 78 
from the ratio Afe/W where N is the total number of pixels in the image. This fraction is 
1 0 then quantised to a score in the range 0 to 5 as set out in Table 8 below. 

TABLE 6 



NtJN : Fraction of image 
pixels that are stained 


Score 


0.00 


0 


<0.01 


1 


6.01 -0,10 




0.11-0.33 


3 


0.34-0.66 


4 


0.67-1.00 


5 



The two soores determined above, i.e. for average colour saturation and fraction of 
sufficiently brown pixels are now added together to give a measure in the range 0 to B. 
The higher this number is, the more oestrogen (ER) positive the sample is. 
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Description of ER status (ER Score) 


Range 


Strongly positive 


7^8 


Positive 


4-6 


Weakly positive 


2-3 


Negative 


0-1 



Women with an ER score of 7 or 8 will respond favourably to hormonal treatment such 
5 as Tamoxifen; women wfth an ER score in the range 4 to 6 wifl have 50% of chance of 
responding to thfe treatment. Women scoring 2 or 3 will not respond very well, and those 
scoring 0 or 1 will not respond to hormonal treatment at all: 

Images for ER and PR are indistinguishable visually and they are distinguished by the 
fact that they are produced using different stains. A PR score is therefor© produced from 
10 stained slides in the same way as an ER score described above. The significance of 
progesterone receptor (PR) posrtivity in a breast carcinoma is less well understood than 
the equivalent for ER. In general, cancers that are ER positive will also be PR positive. 
However, carcinomas that are PR positive, but not ER positive, may have a worse 
prognosis. 
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Fram RGB image 
Compute a Cyan image 
as (2B+GV3 



30 



Apply Sobel filter to Cyan 
image 

32 



Compute mean and 
standard deviation on 
Sobel filter output 



34 



Compute statistical 
measures on Red, 
Green, Blue and Cyan 
images 



Colour Segmentation 
Process 



38 



3 iterations 



Adaptive thresholding 
a 

~t — 



Excess pixel removal 
processes b ,^d 



Morphological closing 



Connected component 
labelling f 



Apply a Downhill Simplex 
optimisation ^ 




Set grand mean and 
mean range = 1 4§ 



Boundary analysis 



4§ 



Sort Boundary data 



50 



Compute grand mean 
and mean range from 
boundary analysis 



Compute output measure 

54I 



Figure 2 
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Apply color segmentation using the 
Hue and Saturation image planes 



Connected component labelling 
and reject unsuitably small regions 
64 



Measure area of remaining regions 



Figure 3 Vascularity process 



Apply K-means clustering using 
Mahalanobis metric. 



70 



T 



Tratnsfomi image Into chromaticity 
space having reference colour 



72 



Compute Hue and Saturation for cued 
points ol Interest 74 



Compute amount of brown stained 
pixels by thresholding from Hue and 

saturation jq 



ER or PR measurement based on 
fraction of stained pixels and average 
colour saturation 73 



Figure 4 ER & PR measurement 
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